Abstract

Batch codes, introduced by Ishai, Kushilevitz, Ostrovsky and Sahai in [1], are methods for solving
the following data storage problem: n data items are to be stored in m servers in such a way that any k
of the n items can be retrieved by reading at most t items from each server, and that the total number
of items stored in m servers is N. A combinatorial batch code (CBC) is a batch code where each data
item is stored without change, i.e., each stored data item is a copy of one of the n data items.

One of the basic yet challenging problems is to find optimal CBCs, i.e., CBCs for which total storage 
(N) is minimal for given values of n, m, k, and t. In [2], Paterson, Stinson and Wei exclusively studied 
CBCs and gave constructions of some optimal CBCs. 

In this article, we give a lower bound on the total storage (N) for CBCs. We give explicit construction 
of optimal CBCs for a range of values of n. For a different range of values of n, we give explicit 
construction of optimal and almost optimal CBCs. Our results partly settle an open problem of [2]. 

Keywords : Batch Codes, Hall's Theorem, Binary Constant Weight Codes. 

1 Introduction 

Batch codes, introduced by Ishai, Kushilevitz, Ostrovsky and Sahai in [1], concern the problem of
distributing a database of n items among m servers in such a way that any k of the n items can be retrieved
by reading at most t items from each server, while keeping the total storage over m servers to N. In [1],
the authors formalized the notion of a batch code with the following general definition.

Definition 1.1 ([1]). An $(n,N,k,m,t)$ batch code over an alphabet $\Sigma$ is defined by an encoding function
$C : \Sigma^n \to (\Sigma^*)^m$ (each output of which is called a bucket) and a decoding algorithm $A$ such that:

1. The total length of all $m$ buckets is $N$ (where the length of each bucket is independent of $x \in \Sigma^n$);

2. For any $x \in \Sigma^n$ and $\{i_1, \ldots, i_k\} \subseteq [n]$, $A(C(x), i_1, \ldots, i_k) = (x_{i_1}, \ldots, x_{i_k})$, and $A$ probes at most $t$
symbols from each bucket in $C(x)$ (whose positions are determined by $i_1, \ldots, i_k$).
In the definition, a string of length n (i.e., "$x \in \Sigma^n$") corresponds to a set of n data items, "m buckets"
refers to m sets of data items stored in m servers and "total length" refers to total storage (N). As part
of an encoding algorithm, one can apply any suitable transformation on the data items (e.g., XOR for
binary data) to be stored, with the condition that the corresponding decoding algorithm should make it
possible to retrieve any subset (of prespecified size k) of data items by reading a limited number (t) of
items from each server. Apart from the defining parameters (n,N,k,m,t), another important parameter
is the rate. The rate of a batch code is defined as the ratio $n/N$.

Since a fixed number of items are read from each server while retrieving a batch of items, batch codes 
can be used for balancing load among servers in a distributed database scenario. It was shown in [1] 
that batch codes can also be used for amortizing computational overhead in private information retrieval 
protocols. 

Of particular interest is the class of batch codes for which encoding is assignment (storage) of items
to servers and decoding is retrieval (reading) of items from servers. Batch codes of this class are called
replication-based batch codes [1] or combinatorial batch codes (CBC) [2]. One obvious advantage of CBCs
is that their encoding and decoding do not incur additional computational overhead. On the other hand,
the storage requirement may be higher for CBCs. This fact is illustrated by a nice example in the introduction
of [1]. As combinatorial objects CBCs are quite interesting, and so far they have been studied in various
combinatorial frameworks. In [1], Ishai et al. used the framework of unbalanced expanders. In [2], Paterson
et al. studied CBCs in the setting of set systems. In [3], Brualdi et al. explored the connection between
CBCs and transversal matroids.

Before proceeding further, we mention here that in this article, we will exclusively consider CBCs 
with t = 1 and will not explicitly include this parameter in any expression. For example, we will write 
(n, N, k, m)-CBC to denote an (n, N, k, m, t)-CBC with t = 1. 

An $(n,N,k,m)$-CBC is called optimal if $N$ is minimal for given $n$, $m$, and $k$. We denote by $N(n,k,m)$
the value of $N$ for an optimal $(n,N,k,m)$-CBC. So, for any $(n,N',k,m)$-CBC it follows that $N' \ge N(n,k,m)$.
An interesting and practically important problem is to find optimal CBCs: given $n$, $m$, $k$, the objective is
to find $N(n,k,m)$ and to give an explicit construction of an optimal $(n,N(n,k,m),k,m)$-CBC. For example,
if $n \le m$, then it may be trivially observed that $N(n,k,m) = n$, and for the corresponding optimal CBC, the $n$
items are stored in any $n$ out of the $m$ servers. But for $n \ge m+1$, finding optimal CBCs is a fairly non-trivial
problem. In [2] and [3], this problem was addressed and some partial results were obtained. Next, we
briefly discuss these results.

Theorem 1.1 ([2]). $N(n,k,k) = kn - k(k-1)$.

For optimal CBC with the above parameters, items are stored in the following way: 

(i) Any k of the n items are stored in k servers; one item per server. 

(ii) k copies of each of the remaining (n — k) items are stored in k servers; one copy of each item per 
server. 

Theorem 1.2 ([2]). $N(m+1,k,m) = m + k$.

In this case, for optimal CBC, items are stored in the following way: 

(i) Any m of the m + 1 items are stored in m servers; one item per server. 

(ii) k copies of the remaining item are stored in any k of the m servers. 

Note that this construction is given in [3] and differs from the construction given in [2]. 

Theorem 1.3 ([2]). If $n \ge (k-1)\binom{m}{k-1}$, then $N(n,k,m) = kn - (k-1)\binom{m}{k-1}$.

In this case, an optimal CBC is obtained in the following way:

(i) $(k-1)\binom{m}{k-1}$ of the $n$ items are grouped into groups of $k-1$ items. These $\binom{m}{k-1}$ groups are
stored in the $\binom{m}{k-1}$ combinations of $k-1$ servers; one group per combination. $k-1$ copies of each of the
$k-1$ items of a group are stored in the $k-1$ servers of the corresponding combination; one copy of each
item per server.

(ii) For the remaining $n - (k-1)\binom{m}{k-1}$ items, $k$ copies of each are stored in any $k$ of the $m$ servers.

Note that in [2], the authors described the constructions in the setting of the 'dual set system', which we
will describe in the next section. In [3] and [4], Brualdi et al. obtained the following optimality result for
$n = m + 2$ using cocircuit representation of transversal matroids.*

Theorem 1.4 ([3]). Let $k$ and $m$ be integers with $2 \le k \le m$. Then

$$N(m+2,k,m) = \begin{cases} m + k - 2 + \lceil 2\sqrt{k+1}\, \rceil & \text{if } m+1-k \ge \lceil \sqrt{k+1}\, \rceil, \\ 2m - 2 + \left\lceil 1 + \frac{k+1}{m+1-k} \right\rceil & \text{if } m+1-k < \lceil \sqrt{k+1}\, \rceil. \end{cases}$$

In this article, we obtain following new results for optimal CBCs. 

1. By extending a technique of [2], we obtain a lower bound on $N(n,k,m)$ for values of $n$ in the range
$1 \le n \le (k-1)\binom{m}{k-1}$.

2. We give an explicit construction of optimal CBCs for values of $n$ in the range $\binom{m}{k-2} \le n \le (k-1)\binom{m}{k-1}$.

3. Using binary constant weight codes, we give an explicit construction for the range
$\binom{m}{k-2} - (m-k+1)A(m,4,k-3) \le n \le \binom{m}{k-2}$ for $k \ge 5$, where $A(m,4,k-3)$ is the maximum number of codewords of
a binary constant weight code of length $m$, weight $k-3$ and Hamming distance 4. This construction
yields optimal CBCs for approximately half of the values of $n$ in this range. For the other half,
the construction yields almost optimal CBCs; for these CBCs the value of $N$ differs by one from the
corresponding value of $N$ given by the lower bound that we have obtained.

Constructions of (2) and (3), which produce optimal CBCs, show that the lower bound of (1) is best
possible for the corresponding ranges and also settle the problem of finding $N(n,k,m)$ for these ranges,
which is a partial solution to a problem left open in [2].

A c-uniform $(n,cn,k,m)$-CBC is a CBC where each item is stored in exactly $c$ servers. In [2], the
authors gave a non-constructive proof of existence of c-uniform $(n,cn,k,m)$-CBCs, for which $n$ is $\Omega(m^{ck/(k-1)-1})$.
Using binary constant weight codes, we provide explicit construction of c-uniform (n, cn, k, m)-CBCs for 
1 < L § J — c < k — 1. For sufficiently large m, c, and k such that c ~ k and c c ~ m, our explicit construction 

is for a value (in asymptotic sense) of $n$ that compares well with the bound $\Omega(m^{ck/(k-1)-1})$.



2 Setting and Preliminaries 

Let $C$ be an $(n,N,k,m)$-CBC, where $x_1, x_2, \ldots, x_n$ are $n$ items and $s_1, s_2, \ldots, s_m$ are $m$ servers. We
represent $C$ by a set system $(S,\mathcal{X})$, where $S = \{s_1, s_2, \ldots, s_m\}$ is the set of $m$ servers and
$\mathcal{X} = (X_1, X_2, \ldots, X_n)$ is a collection of $n$ subsets of $S$, each subset representing one of the $n$ items.
If an item $x_j$ is stored in servers $s_{i_1}, s_{i_2}, \ldots, s_{i_l}$, then we represent $x_j$ by the subset
$X_j = \{s_{i_1}, s_{i_2}, \ldots, s_{i_l}\}$, $1 \le j \le n$, $\{i_1, i_2, \ldots, i_l\} \subseteq [m]$. For example, let $S = \{s_1, s_2, s_3\}$ be the set of
servers. Let server $s_1$ contain items $x_1$, $x_2$, and $x_3$, server $s_2$ contain items $x_1$ and $x_2$, and server $s_3$
contain item $x_2$. Then the collection $\mathcal{X}$ is $(\{s_1,s_2\}, \{s_1,s_2,s_3\}, \{s_1\})$, where the set $\{s_1,s_2\}$ represents
item $x_1$ (since it is stored in servers $s_1$ and $s_2$), the set $\{s_1,s_2,s_3\}$ represents item $x_2$, and the set $\{s_1\}$
represents item $x_3$. In [2], this setting was referred to as the dual set system.
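The passage from server contents to the dual set system can be sketched in a few lines. This is only an illustration of the example above; the function name `dual_set_system` and the dictionary layout are assumptions made here and are not part of [2].

```python
def dual_set_system(server_contents):
    """Map each item to the set of servers that store it (the dual set system)."""
    dual = {}
    for server, items in server_contents.items():
        for item in items:
            dual.setdefault(item, set()).add(server)
    return {item: frozenset(servers) for item, servers in dual.items()}


# The three-server example from the text.
server_contents = {"s1": {"x1", "x2", "x3"}, "s2": {"x1", "x2"}, "s3": {"x2"}}
X = dual_set_system(server_contents)

# Total storage N is the sum of the sizes of the sets of the collection.
N = sum(len(servers) for servers in X.values())

for item in sorted(X):
    print(item, sorted(X[item]))
print("N =", N)
```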

Since $C$ is an $(n,N,k,m)$-CBC, the total number of items stored in the $m$ servers is $N$. So, counting in
terms of the number of servers that store a particular item, we have in the setting of the set system $(S,\mathcal{X})$ that
$\sum_{X \in \mathcal{X}} |X| = N$.

*In a very recent work Bujtas and Tuza studied this case in the framework of hypergraphs. See [5] for more details. 
Now, it may be observed that a set of $k$ items $x_{i_1}, x_{i_2}, \ldots, x_{i_k}$ can be retrieved by reading at most one
item per server iff there are $k$ distinct servers $s_{j_1}, s_{j_2}, \ldots, s_{j_k} \in S$ such that $s_{j_r} \in X_{i_r}$ for $1 \le r \le k$, which
is the same as saying that the collection $(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$ has a system of distinct representatives (SDR).
In the previous example, $s_2, s_3, s_1$ is an SDR for the collection $(\{s_1,s_2\}, \{s_1,s_2,s_3\}, \{s_1\})$; items $x_1, x_2, x_3$
are retrieved by reading $x_1$ from server $s_2$, $x_2$ from server $s_3$, and $x_3$ from server $s_1$. Since $C$ satisfies the
condition that any $k$ data items can be retrieved by reading at most one item per server, $(S,\mathcal{X})$ satisfies
the condition that for any $\{i_1, i_2, \ldots, i_k\} \subseteq [n]$, the collection of sets $(X_{i_1}, X_{i_2}, \ldots, X_{i_k})$ of $\mathcal{X}$ has an SDR.
Hall's Theorem provides necessary and sufficient conditions for existence of an SDR for a collection of sets.
Below we state the theorem.

Theorem 2.1 (Hall's Theorem ([8])). A set system $\mathcal{T} = \{A_1, A_2, \ldots, A_m\}$ has a system of distinct
representatives iff

$$\Big| \bigcup_{i \in S} A_i \Big| \ge |S|$$

for all $S \subseteq \{1, 2, \ldots, m\}$.
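An SDR, when it exists, can be found with a standard augmenting-path bipartite matching. The sketch below is an illustration only (the paper itself does not give an algorithm); `find_sdr` is a name chosen here, and elements are tried in sorted order purely to make the result deterministic.

```python
def find_sdr(sets):
    """Return one distinct representative per set, or None if no SDR exists."""
    owner = {}  # element -> index of the set it currently represents

    def try_assign(i, seen):
        # Try to give set i a representative, re-assigning earlier sets if needed.
        for e in sorted(sets[i]):
            if e in seen:
                continue
            seen.add(e)
            if e not in owner or try_assign(owner[e], seen):
                owner[e] = i
                return True
        return False

    for i in range(len(sets)):
        if not try_assign(i, set()):
            return None  # Hall's condition fails for some subcollection
    reps = [None] * len(sets)
    for e, i in owner.items():
        reps[i] = e
    return reps


example = [{"s1", "s2"}, {"s1", "s2", "s3"}, {"s1"}]
sdr = find_sdr(example)
print(sdr)
print(find_sdr([{"s1"}, {"s1"}]))
```

On the running example this recovers the SDR $s_2, s_3, s_1$ mentioned in the text, and it reports failure for two copies of $\{s_1\}$, whose union has only one element.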

The necessary and sufficient condition in Hall's Theorem is known as Hall's condition. So, for $C$ to
satisfy the condition that any $k$ data items can be retrieved by reading at most one item per server, it is
necessary and sufficient that $(S,\mathcal{X})$ satisfies the following restricted form of Hall's condition.

HC1[k]: Given any $r$ sets $X_{i_1}, X_{i_2}, \ldots, X_{i_r}$ of $\mathcal{X}$, we have that $\big|\bigcup_{j=1}^{r} X_{i_j}\big| \ge r$ for all $r$, $1 \le r \le k$.

In other words, the union of any $r$ sets of $\mathcal{X}$ contains at least $r$ elements for $1 \le r \le k$. Equivalently, the
above condition can also be stated in the following way.

HC2[k]: Any $r$-element subset of $S$ contains at most $r$ sets of $\mathcal{X}$ for all $r$, $0 \le r \le k-1$.
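For small parameters both forms of the restricted Hall's condition can be checked by brute force. The following sketch is only meant to make the two statements concrete; the function names are chosen here, and the exhaustive enumeration is exponential, so it is not a practical test for large systems.

```python
from itertools import combinations


def satisfies_hc1(collection, k):
    """HC1[k]: the union of any r sets has at least r elements, 1 <= r <= k."""
    for r in range(1, k + 1):
        for sub in combinations(collection, r):
            if len(frozenset().union(*sub)) < r:
                return False
    return True


def satisfies_hc2(servers, collection, k):
    """HC2[k]: any r-element subset of S contains at most r sets, 0 <= r <= k-1."""
    for r in range(0, k):
        for subset in combinations(sorted(servers), r):
            subset = frozenset(subset)
            if sum(1 for X in collection if X <= subset) > r:
                return False
    return True


S = {"s1", "s2", "s3"}
example = [frozenset({"s1", "s2"}), frozenset({"s1", "s2", "s3"}), frozenset({"s1"})]
tripled = [frozenset({"s1", "s2"})] * 3  # three items stored on the same two servers

print(satisfies_hc1(example, 3), satisfies_hc2(S, example, 3))
print(satisfies_hc1(tripled, 3), satisfies_hc2(S, tripled, 3))
```

The two checks agree on both collections, as the stated equivalence requires: the running example passes for $k = 3$, while three copies of $\{s_1, s_2\}$ fail.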

It is sometimes convenient to use the latter of the above two forms of the restricted Hall's condition.
In this article we will use both forms. Next, we introduce the following notation to conveniently express
sub-conditions of HC1[k] and HC2[k].

(i) For a set system $(S,\mathcal{X})$ and a subcollection $\mathcal{Y} \subseteq \mathcal{X}$, we denote by HC1($\mathcal{Y}$) the condition that
$\big|\bigcup_{X \in \mathcal{Y}} X\big| \ge |\mathcal{Y}|$. So, for given $k$, $(S,\mathcal{X})$ satisfies HC1[k] iff for all $\mathcal{Y} \subseteq \mathcal{X}$ with $1 \le |\mathcal{Y}| \le k$, HC1($\mathcal{Y}$) is satisfied.

(ii) For a set system $(S,\mathcal{X})$ and a subset $X \subseteq S$ with $|X| = i$, we denote by HC2($X$) the condition that
$X$ contains at most $i$ sets of $\mathcal{X}$. So, for given $k$, $(S,\mathcal{X})$ satisfies HC2[k] iff for all $X \subseteq S$ with
$0 \le |X| \le k-1$, HC2($X$) is satisfied.

Also, to simplify the presentation, we identify a CBC with the corresponding set system. For example,
we say '$(S,\mathcal{X})$ is an $(n,N,k,m)$-CBC' rather than '$(S,\mathcal{X})$ corresponds to (or represents) an
$(n,N,k,m)$-CBC'.

At this point, it is interesting to note that for a set system $(S,\mathcal{X})$ the sub-conditions HC2($X$), $X \subseteq S$,
are independent, in the sense that for a given subset $X$ of $S$, satisfaction of every other HC2($Y$), $Y \subseteq S$,
$Y \ne X$, does not necessarily imply satisfaction of HC2($X$). For example, let $S = \{s_1, s_2, s_3\}$ and let
$\mathcal{X} = (\{s_1,s_2\}, \{s_1,s_2\}, \{s_1,s_2\})$. Then it can be verified that only HC2($\{s_1,s_2\}$) is violated.

Now, for an $(n,N,k,m)$-CBC $(S,\mathcal{X})$ and for each $i$, $1 \le i \le k-1$, we use the method of double-counting
to 'combine' the $\binom{m}{i}$ sub-conditions HC2($X$), $X \subseteq S$, $|X| = i$, into an inequality in the following way: we form
an $\binom{m}{i} \times n$ matrix $M^i$, whose rows are labeled by all the $i$-subsets (i.e., $i$-element subsets; for convenience
we will use this short form) of $S$, and whose columns are labeled by the $n$ sets of $\mathcal{X}$.† The $(r,s)$-th entry of $M^i$ is 1
if the $s$-th set of $\mathcal{X}$ is contained in the $r$-th $i$-subset of $S$, otherwise it is 0. Let $A_j$ denote the number of
†In this article, by 'set of a collection' or 'subset of a collection' we will mean a set or subset contained in a collection as a
member.
$j$-sets of $\mathcal{X}$, $1 \le j \le k-1$. Next, we count the number of 1s in $M^i$ in two ways. Since HC2($X$) is satisfied
for all $X \subseteq S$, $|X| = i$, each row has at most $i$ 1s; hence, counting row-wise, there are at most $i\binom{m}{i}$ 1s
in $M^i$. A column labeled by $X' \in \mathcal{X}$ with $1 \le |X'| = j \le i$ has exactly $\binom{m-j}{i-j}$ 1s, and has none if $j > i$.
Hence, counting column-wise, there are exactly $\sum_{j=1}^{i} \binom{m-j}{i-j} A_j$ 1s in $M^i$. Comparing these two numbers,
we get the following inequality.

$$\sum_{j=1}^{i} \binom{m-j}{i-j} A_j \le i\binom{m}{i}, \qquad 1 \le i \le k-1. \tag{2.1}$$
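The double count behind (2.1) can be replayed numerically. The sketch below counts the 1s of $M^i$ row-wise and column-wise for the optimal $(m+1, m+k, k, m)$-CBC of Theorem 1.2 with $m = 4$, $k = 3$; the helper name `count_ones` and the choice of example are assumptions made for illustration.

```python
from itertools import combinations
from math import comb


def count_ones(servers, collection, i):
    """Count the 1s of the matrix M^i row-wise and column-wise."""
    rows = [frozenset(r) for r in combinations(sorted(servers), i)]
    # Row-wise: for each i-subset of S, count the sets of the collection it contains.
    by_rows = sum(sum(1 for X in collection if X <= row) for row in rows)
    # Column-wise: a j-set with 1 <= j <= i lies in binom(m-j, i-j) of the i-subsets.
    m = len(servers)
    by_cols = sum(comb(m - len(X), i - len(X)) for X in collection if 1 <= len(X) <= i)
    return by_rows, by_cols


# Theorem 1.2 construction for m = 4, k = 3: four singletons and one 3-set.
S = ["s1", "s2", "s3", "s4"]
X = [frozenset({s}) for s in S] + [frozenset({"s1", "s2", "s3"})]
k = 3

for i in range(1, k):
    by_rows, by_cols = count_ones(S, X, i)
    print(i, by_rows, by_cols, i * comb(len(S), i))
```

Both counts agree for each $i$, and in this example they also meet the upper bound $i\binom{m}{i}$ with equality.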

So, considering every $i$ in the range $1 \le i \le k-1$, we get $k-1$ inequalities, each of which is satisfied by
the $(n,N,k,m)$-CBC $(S,\mathcal{X})$. Here it is important to note that these $k-1$ inequalities, unlike the conditions
HC2($X$), $X \subseteq S$ (from which they are derived), are not mutually independent. In fact, we show in the
next lemma that if the $(i+1)$-th inequality is satisfied, then the $i$-th inequality is also satisfied.

Lemma 2.1. For $m \ge 3$ and $1 \le i \le m-2$, if $\sum_{j=1}^{i+1} \binom{m-j}{i+1-j} A_j \le (i+1)\binom{m}{i+1}$, then $\sum_{j=1}^{i} \binom{m-j}{i-j} A_j \le i\binom{m}{i}$.

Proof. We prove the contrapositive, i.e., we show that if $\sum_{j=1}^{i} \binom{m-j}{i-j} A_j > i\binom{m}{i}$, then
$\sum_{j=1}^{i+1} \binom{m-j}{i+1-j} A_j > (i+1)\binom{m}{i+1}$.

Now,

$$\sum_{j=1}^{i+1} \binom{m-j}{i+1-j} A_j = A_{i+1} + (m-i) \sum_{j=1}^{i} \frac{1}{i+1-j}\binom{m-j}{i-j} A_j \ge A_{i+1} + \frac{m-i}{i} \sum_{j=1}^{i} \binom{m-j}{i-j} A_j.$$

Since we assumed $\sum_{j=1}^{i} \binom{m-j}{i-j} A_j > i\binom{m}{i}$, we have from the above that

$$\sum_{j=1}^{i+1} \binom{m-j}{i+1-j} A_j > A_{i+1} + (m-i)\binom{m}{i} \ge (i+1)\binom{m}{i+1}.$$

The last step follows from the fact that $A_{i+1} \ge 0$ and $(m-i)\binom{m}{i} = (i+1)\binom{m}{i+1}$. □

Hence, it follows that if the $(k-1)$-th inequality, i.e.,

$$\sum_{j=1}^{k-1} \binom{m-j}{k-1-j} A_j \le (k-1)\binom{m}{k-1}, \tag{2.2}$$

is satisfied, then the remaining $k-2$ inequalities are also satisfied. That is, as necessary conditions, the
other $k-2$ inequalities are redundant with respect to (2.2), and excluding these $k-2$ inequalities from
further consideration, we will only use the fact that an $(n,N,k,m)$-CBC $(S,\mathcal{X})$ satisfies (2.2) as a necessary
condition. Note that in [2], the authors obtained inequality (2.2) in the proof of Theorem 1.3 by considering
the case of $i = k-1$ only. What we have also shown here is that the other $k-2$ inequalities, obtained by
considering the sub-conditions HC2($X$) with $|X| = i$, $1 \le i \le k-2$, in a similar way, are redundant; a fact which was
not observed in [2] and was not very evident in the first place.

Inequality (2.1) is a 'tight' necessary condition for CBCs for wide ranges of values of $n$, in the sense that
for these ranges of values of $n$ there are CBCs that just satisfy this inequality (i.e., satisfy it with equality),
and these are the optimal CBCs for the corresponding ranges (e.g., the optimal CBCs constructed in [2] for
$n \ge (k-1)\binom{m}{k-1}$). In the next section, we implicitly demonstrate this 'tightness' by using (2.2) to obtain
a lower bound on $N(n,k,m)$ for $1 \le n \le (k-1)\binom{m}{k-1}$, and then constructing optimal CBCs (i.e., CBCs
that meet this lower bound) for a sub-range of $1 \le n \le (k-1)\binom{m}{k-1}$.

As an immediate corollary of this inequality, we state the following theorem of [2].

Theorem 2.2 ([2]). $n(m,c,k) \le \dfrac{(k-1)\binom{m}{c}}{\binom{k-1}{c}}$, where $n(m,c,k)$ is the maximum value of $n$ such that there
exists a c-uniform $(n,cn,k,m)$-CBC.

$c = 1$, $k-2$, and $k-1$ are the only values, known so far, for which there exist c-uniform $(n,cn,k,m)$-CBCs
with $n$ given by the expression of Theorem 2.2.



3 Our Results 

3.1 Lower bound on $N(n,k,m)$ for $1 \le n \le (k-1)\binom{m}{k-1}$

Here we obtain a lower bound on $N(n,k,m)$ for $1 \le n \le (k-1)\binom{m}{k-1}$. The proof is divided into three
steps. First, we prove a useful inequality in Lemma 3.1. In Lemma 3.2 we use inequality (2.1) to get a
relationship between $N$ and $\frac{(k-1)\binom{m}{c}}{\binom{k-1}{c}}$ for any $c$ such that $1 \le c \le k-1$. From this relationship we get
different estimates of the lower bound on $N(n,k,m)$ for different values of $c$, $1 \le c \le k-1$. Our use of (2.1)
is the same as in the proof of Theorem 1.3 of [2]. There, the authors obtained a lower bound on $N(n,k,m)$ for
$n \ge (k-1)\binom{m}{k-1}$; they set up a relationship between $N$ and $(k-1)\binom{m}{k-1}$ (the value of $\frac{(k-1)\binom{m}{c}}{\binom{k-1}{c}}$ at $c = k-1$).
Here we generalize (using Lemma 3.1) their approach for any $c$ in the stated range. Finally, in Theorem
3.1 we find the $c$ that gives the best estimate for the lower bound on $N(n,k,m)$.



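Lemma 3.1. Let $1 \le c < k \le m$ and $0 \le i \le k-1$. Then

$$\frac{\binom{m-i}{k-1-i}}{\binom{m-c}{k-1-c}} - 1 \ge \frac{(m-k+1)(c-i)}{k-c}.$$

Proof. Notice that both sides are equal for $i = c$ and $i = c-1$, and both sides decrease as $i$ goes from
$0$ to $k-1$. Hence, it is sufficient to show that the difference between the l.h.s. values for $i-1$ and $i$ is greater than
or equal to $\frac{m-k+1}{k-c}$ for $1 \le i \le c-1$, and is less than or equal to $\frac{m-k+1}{k-c}$ for $c+1 \le i \le k-1$.

Now,

$$\frac{\binom{m-i+1}{k-i} - \binom{m-i}{k-i-1}}{\binom{m-c}{k-c-1}} = \frac{\binom{m-i}{k-i}}{\binom{m-c}{k-c-1}} = \frac{\frac{m-k+1}{k-i}\binom{m-i}{k-i-1}}{\binom{m-c}{k-c-1}} = \frac{\frac{m-k+1}{k-c}\binom{m-i}{k-i}}{\binom{m-c}{k-c}}.$$

Here we note that when $c > i$, we have $\binom{m-i}{k-i} = \binom{m-i}{c-i}\binom{m-c}{k-c} \Big/ \binom{k-i}{c-i}$.

And when $i > c$, we have $\binom{m-i}{k-i} = \binom{k-c}{i-c}\binom{m-c}{k-c} \Big/ \binom{m-c}{i-c}$.

In the above two cases we have used the identity $\binom{x}{y}\binom{y}{z} = \binom{x}{z}\binom{x-z}{y-z}$, for $x \ge y \ge z \ge 0$.

Hence,

$$\frac{\frac{m-k+1}{k-c}\binom{m-i}{k-i}}{\binom{m-c}{k-c}} = \frac{m-k+1}{k-c} \cdot \frac{\binom{m-i}{c-i}}{\binom{k-i}{c-i}} \ge \frac{m-k+1}{k-c} \quad \text{when } c > i,$$

$$\frac{\frac{m-k+1}{k-c}\binom{m-i}{k-i}}{\binom{m-c}{k-c}} = \frac{m-k+1}{k-c} \cdot \frac{\binom{k-c}{i-c}}{\binom{m-c}{i-c}} \le \frac{m-k+1}{k-c} \quad \text{when } i > c. \qquad \Box$$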
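As a sanity check, the inequality of Lemma 3.1 can be tested exhaustively for small parameters using exact rational arithmetic. This is a numerical illustration only, not a substitute for the proof; the function name and the parameter ranges are choices made here.

```python
from fractions import Fraction
from math import comb


def lemma_3_1_holds(m, k, c, i):
    """Check binom(m-i,k-1-i)/binom(m-c,k-1-c) - 1 >= (m-k+1)(c-i)/(k-c) exactly."""
    lhs = Fraction(comb(m - i, k - 1 - i), comb(m - c, k - 1 - c)) - 1
    rhs = Fraction((m - k + 1) * (c - i), k - c)
    return lhs >= rhs


# All parameters with 1 <= c < k <= m <= 11 and 0 <= i <= k-1.
checked = [
    lemma_3_1_holds(m, k, c, i)
    for m in range(2, 12)
    for k in range(2, m + 1)
    for c in range(1, k)
    for i in range(0, k)
]
print(len(checked), all(checked))
```

The same loop also confirms that equality holds at $i = c$ and $i = c - 1$ if `>=` is replaced by `==` for those two values of $i$.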



Let us use the notation $U_{m,k,c}$ for the expression $\dfrac{(k-1)\binom{m}{c}}{\binom{k-1}{c}}$. Note that $U_{m,k,c}$ may not be an integer for
given values of $m$, $c$, and $k$, and that $U_{m,k,c} < U_{m,k,c'}$ for $c < c'$.

Lemma 3.2. Let $(S,\mathcal{X})$ be an $(n,N,k,m)$-CBC and $1 \le c \le k-1$. Then

$$N \ge nc - \frac{(k-c)(U_{m,k,c} - n)}{m-k+1} + \frac{(k-c)(m-k)}{m-k+1} A_k,$$

where $A_k$ is the number of $k$-sets of $\mathcal{X}$.

Proof. Since $(S,\mathcal{X})$ satisfies HC1[k], it is sufficient for a set of $\mathcal{X}$ to be of size $k$, in the sense that if
a subcollection $\mathcal{Y} \subseteq \mathcal{X}$ with $1 \le |\mathcal{Y}| \le k$ contains a $k$-set, then it always satisfies HC1($\mathcal{Y}$). Hence, without
loss of generality, we assume that each set of $\mathcal{X}$ is of cardinality at most $k$. Let $A_i$ be the number of $i$-sets
of $\mathcal{X}$. Then we have the following equation.

$$\sum_{i=1}^{k} A_i = n. \tag{3.1}$$

As a necessary condition, $(S,\mathcal{X})$ also satisfies the following inequality.

$$\sum_{i=1}^{k-1} \binom{m-i}{k-1-i} A_i \le (k-1)\binom{m}{k-1}. \tag{3.2}$$

Dividing both sides of (3.2) by $\binom{m-c}{k-1-c}$ and then subtracting (3.1), we get

$$\sum_{i=1}^{k-1} \left( \frac{\binom{m-i}{k-1-i}}{\binom{m-c}{k-1-c}} - 1 \right) A_i - A_k \le U_{m,k,c} - n. \tag{3.3}$$

Employing Lemma 3.1 in (3.3), we get

$$\sum_{i=1}^{k-1} (c-i) A_i \le \frac{(k-c)(U_{m,k,c} + A_k - n)}{m-k+1}. \tag{3.4}$$

Now,

$$N = \sum_{i=1}^{k} i A_i = nc - \sum_{i=1}^{k} (c-i) A_i = nc - \sum_{i=1}^{k-1} (c-i) A_i + (k-c) A_k. \tag{3.5}$$

Using (3.4), we get

$$N \ge nc - \frac{(k-c)(U_{m,k,c} - n)}{m-k+1} + \frac{(k-c)(m-k)}{m-k+1} A_k. \tag{3.6}$$
□



Theorem 3.1. Let $1 \le n \le (k-1)\binom{m}{k-1}$, $1 \le c \le k-1$, and let $c$ be the least integer such that $n \le U_{m,k,c}$.
Then $nc - \left\lfloor \dfrac{(k-c)(U_{m,k,c} - n)}{m-k+1} \right\rfloor$ is a lower bound for $N(n,k,m)$.

Proof. Let $(S,\mathcal{X})$ be an optimal $(n,N,k,m)$-CBC (i.e., for it $N = N(n,k,m)$) and $A_k$ be the number
of $k$-sets of $\mathcal{X}$. Then for any $i$, $1 \le i \le k-1$, we have from relation (3.6) of Lemma 3.2 that

$$N(n,k,m) \ge ni - \frac{(k-i)(U_{m,k,i} - n)}{m-k+1} + \frac{(k-i)(m-k)}{m-k+1} A_k.$$

Since $A_k \ge 0$, we have $N(n,k,m) \ge ni - \frac{(k-i)(U_{m,k,i} - n)}{m-k+1}$.
Let us denote $ni - \frac{(k-i)(U_{m,k,i} - n)}{m-k+1}$ by $b(n,k,m,i)$. Next, we find the $i$ which maximizes $b(n,k,m,i)$. For
$1 \le i \le k-2$, we have

$$\begin{aligned}
b(n,k,m,i) - b(n,k,m,i+1) &= ni - \frac{(k-i)(U_{m,k,i} - n)}{m-k+1} - n(i+1) + \frac{(k-i-1)(U_{m,k,i+1} - n)}{m-k+1} \\
&= \frac{(k-i-1)(U_{m,k,i+1} - U_{m,k,i})}{m-k+1} - \frac{U_{m,k,i} - n}{m-k+1} - n \\
&= U_{m,k,i} - \frac{U_{m,k,i} - n}{m-k+1} - n \\
&= \frac{(m-k)(U_{m,k,i} - n)}{m-k+1},
\end{aligned}$$

using $U_{m,k,i+1} - U_{m,k,i} = \dfrac{(m-k+1)\,U_{m,k,i}}{k-i-1}$.

Let $c$ be such that $U_{m,k,1} < U_{m,k,2} < \ldots < n \le U_{m,k,c} < U_{m,k,c+1} < \ldots < U_{m,k,k-1}$, $1 \le c \le k-1$. Then, using the
above relation, we have $b(n,k,m,1) \le b(n,k,m,2) \le \ldots \le b(n,k,m,c) \ge b(n,k,m,c+1) \ge \ldots \ge b(n,k,m,k-1)$.

Since $N(n,k,m)$ is an integer, we have $N(n,k,m) \ge \lceil b(n,k,m,c) \rceil = nc - \left\lfloor \frac{(k-c)(U_{m,k,c} - n)}{m-k+1} \right\rfloor$, and so
$nc - \left\lfloor \frac{(k-c)(U_{m,k,c} - n)}{m-k+1} \right\rfloor$ is a lower bound on $N(n,k,m)$. □
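The bound of Theorem 3.1 is straightforward to evaluate. The sketch below computes $U_{m,k,c}$, the estimates $b(n,k,m,i)$ and the resulting lower bound with exact fractions; the function names `U`, `b` and `lower_bound` are chosen here for illustration.

```python
from fractions import Fraction
from math import ceil, comb, floor


def U(m, k, c):
    """U_{m,k,c} = (k-1) * binom(m,c) / binom(k-1,c), kept as an exact fraction."""
    return Fraction((k - 1) * comb(m, c), comb(k - 1, c))


def b(n, k, m, i):
    """The estimate b(n,k,m,i) from the proof of Theorem 3.1."""
    return n * i - (k - i) * (U(m, k, i) - n) / (m - k + 1)


def lower_bound(n, k, m):
    """Lower bound of Theorem 3.1, valid for 1 <= n <= (k-1) * binom(m,k-1)."""
    c = next(c for c in range(1, k) if n <= U(m, k, c))  # least c with n <= U_{m,k,c}
    return n * c - floor((k - c) * (U(m, k, c) - n) / (m - k + 1))


m, k = 6, 4
print([U(m, k, c) for c in range(1, k)])
print(lower_bound(43, k, m))

# The chosen c indeed gives the best of the k-1 estimates, for every n in range.
best = all(
    lower_bound(n, k, m) == ceil(max(b(n, k, m, i) for i in range(1, k)))
    for n in range(1, (k - 1) * comb(m, k - 1) + 1)
)
print(best)
```

For $m = 6$, $k = 4$ the thresholds are $U_{6,4,1} = 6$, $U_{6,4,2} = 15$ and $U_{6,4,3} = 60$, so $n = 43$ falls under $c = 3$.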



3.2 Construction of optimal CBCs for the range $\binom{m}{k-2} \le n \le (k-1)\binom{m}{k-1}$

Let $S$ be the set of servers, where $|S| = m$. Avoiding the trivial case of $k = 2$, for which the range becomes
$1 \le n \le m$, we consider cases where $m \ge k \ge 3$. Roughly, the construction is as follows. We start with
a CBC $(S,\mathcal{X}_I)$, in which $\mathcal{X}_I$ is a collection of $(k-1)$-subsets of $S$. We also take an auxiliary collection $\mathcal{X}_a$
of distinct‡ $(k-2)$-subsets of $S$. From $\mathcal{X}_I$ we systematically delete $(k-1)$-sets and add to it $(k-2)$-sets
from $\mathcal{X}_a$ to get the final collection $\mathcal{X}$. Below we describe the construction in more detail.

Construction. As the initial collection, we take the collection $\mathcal{X}_I$ of sets of an optimal $(n,N,k,m)$-CBC
$(S,\mathcal{X}_I)$, where $n = U_{m,k,k-1} = (k-1)\binom{m}{k-1}$.§ In this collection, there are $k-1$ copies of each of the
$(k-1)$-subsets of $S$.

For the CBC to be constructed, we have $U_{m,k,k-2} \le n \le U_{m,k,k-1}$. Hence, $0 \le U_{m,k,k-1} - n \le
(m-k+1)\binom{m}{k-2}$. The auxiliary collection $\mathcal{X}_a$ contains any $\left\lceil \frac{U_{m,k,k-1} - n}{m-k+1} \right\rceil$ distinct $(k-2)$-subsets of $S$. This
is possible given the range of values of $U_{m,k,k-1} - n$. Next, we do the following $\left\lfloor \frac{U_{m,k,k-1} - n}{m-k+1} \right\rfloor$ times.

1. Select a $(k-2)$-set from $\mathcal{X}_a$ and delete one copy of each of its $m-k+2$ supersets from $\mathcal{X}_I$. For each
selected $(k-2)$-set of $\mathcal{X}_a$, we can always delete one copy of each of its $m-k+2$ supersets from $\mathcal{X}_I$
irrespective of previous deletions. This is because there are $k-1$ copies of each of the $(k-1)$-subsets
of $S$ in the initial collection $\mathcal{X}_I$. So, for a $(k-1)$-subset of $\mathcal{X}_I$, its $k-1$ copies may be assumed to be
assigned to its $k-1$ distinct $(k-2)$-subsets; one copy per subset. Therefore, for a $(k-2)$-set of $\mathcal{X}_a$
there corresponds a copy of each of its $m-k+2$ supersets in $\mathcal{X}_I$.

2. Add the $(k-2)$-set to the collection $\mathcal{X}_I$ and delete it from the auxiliary collection $\mathcal{X}_a$.

Finally, if $(m-k+1) \nmid (U_{m,k,k-1} - n)$, then for the remaining $(k-2)$-set of $\mathcal{X}_a$, delete one copy of each of
$(U_{m,k,k-1} - n) - \left\lfloor \frac{U_{m,k,k-1} - n}{m-k+1} \right\rfloor (m-k+1)$ of its supersets from $\mathcal{X}_I$.

In the end, we get the final collection X of n subsets of S. Before proving its correctness, we give an 
example to illustrate the construction. 



Example 3.2.1. Let us take $m = 6$, $k = 4$, $n = 43$ and $S = \{s_1, s_2, s_3, s_4, s_5, s_6\}$. Hence, $U_{m,k,k-1} = 60$,
and the initial collection $\mathcal{X}_I$ contains $k-1 = 3$ copies of each of the 20 3-subsets of $S$. Here $\left\lceil \frac{U_{m,k,k-1} - n}{m-k+1} \right\rceil = 6$, and
the auxiliary collection $\mathcal{X}_a$ contains 6 2-subsets of $S$; let it be $(\{s_1,s_2\}, \{s_2,s_3\}, \{s_3,s_4\}, \{s_4,s_5\}, \{s_5,s_6\}, \{s_1,s_6\})$.
For step 1, we select the subset $\{s_1,s_2\}$ of $\mathcal{X}_a$ and delete one copy of each of its $m-k+2 = 4$ supersets (i.e.,
$\{s_1,s_2,s_3\}$, $\{s_1,s_2,s_4\}$, $\{s_1,s_2,s_5\}$, $\{s_1,s_2,s_6\}$) from $\mathcal{X}_I$; for step 2, we add the subset $\{s_1,s_2\}$ to $\mathcal{X}_I$ and delete it from
$\mathcal{X}_a$. We repeat these steps for 4 other subsets (let us take $\{s_2,s_3\}$, $\{s_3,s_4\}$, $\{s_4,s_5\}$, $\{s_5,s_6\}$) of $\mathcal{X}_a$.
Finally, for the remaining subset $\{s_1,s_6\}$, we delete one copy of each of two of its supersets, $\{s_1,s_2,s_6\}$ and $\{s_1,s_3,s_6\}$, from the
collection $\mathcal{X}_I$. Table 1 shows the final collection $\mathcal{X}$.

‡Here 'distinct' means that the subsets contain different elements, i.e., they are distinct as subsets of $S$. For the rest of
this article, this interpretation will be assumed.

§So far, there is only one optimal collection known for $n = U_{m,k,k-1}$; hence, we write 'the collection' here. We do the same
in the next construction for $n = U_{m,k,k-2}$.



Table 1: Final collection X of Example 3.2.1

Subset            Number of copies    Subset            Number of copies
{s1, s2, s3}      1                   {s2, s3, s6}      2
{s1, s2, s4}      2                   {s2, s4, s6}      3
{s1, s2, s5}      2                   {s2, s5, s6}      3
{s1, s2, s6}      2                   {s3, s4, s5}      1
{s1, s3, s4}      2                   {s3, s4, s6}      2
{s1, s3, s5}      3                   {s3, s5, s6}      2
{s1, s3, s6}      2                   {s4, s5, s6}      1
{s1, s4, s5}      2                   {s1, s2}          1
{s1, s4, s6}      3                   {s2, s3}          1
{s1, s5, s6}      2                   {s3, s4}          1
{s2, s3, s4}      1                   {s4, s5}          1
{s2, s3, s5}      2                   {s5, s6}          1
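The construction of this subsection can be replayed mechanically. The sketch below is one possible reading of the steps, under the assumptions that servers are numbered $1, \ldots, m$ and that, in the final partial step, the supersets are taken in increasing order of the added server; it then verifies HC1[k] by brute force for the parameters of Example 3.2.1. The function names are chosen here.

```python
from collections import Counter
from itertools import combinations
from math import comb


def construct_3_2(m, k, n, aux):
    """Sketch of the Section 3.2 construction; aux lists distinct (k-2)-subsets."""
    servers = range(1, m + 1)
    U = (k - 1) * comb(m, k - 1)
    assert m >= k >= 3 and comb(m, k - 2) <= n <= U
    full, rest = divmod(U - n, m - k + 1)
    assert len(aux) >= full + (1 if rest else 0)
    # Initial collection: k-1 copies of every (k-1)-subset of S.
    X = Counter({frozenset(s): k - 1 for s in combinations(servers, k - 1)})
    for Y in aux[:full]:
        for s in servers:
            if s not in Y:
                X[Y | {s}] -= 1  # step 1: delete one copy of each superset
        X[Y] += 1                # step 2: add the (k-2)-set itself
    if rest:
        Y = aux[full]
        supersets = [Y | {s} for s in servers if s not in Y]
        for Z in supersets[:rest]:
            X[Z] -= 1            # final partial step: deletions only
    return list(X.elements())


def satisfies_hc1(collection, k):
    """Brute-force check of HC1[k] (exponential; small examples only)."""
    return all(
        len(frozenset().union(*sub)) >= r
        for r in range(1, k + 1)
        for sub in combinations(collection, r)
    )


aux = [frozenset(p) for p in [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (1, 6)]]
X = construct_3_2(6, 4, 43, aux)
n, N = len(X), sum(len(s) for s in X)
print(n, N, satisfies_hc1(X, 4))
```

With these parameters the sketch produces $n = 43$ sets with total storage $N = 124$, which is the value given by the lower bound of Theorem 3.1 for $m = 6$, $k = 4$, $n = 43$.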



Proof of correctness. Note that the sets of $\mathcal{X}$ are of cardinality $k-1$ and $k-2$, and the $(k-2)$-sets are
distinct. In order to prove correctness of the construction, we show that $(S,\mathcal{X})$ satisfies HC1[k], i.e., for
all subcollections $\mathcal{Y} \subseteq \mathcal{X}$ with $1 \le |\mathcal{Y}| \le k$, HC1($\mathcal{Y}$) is satisfied.

1. HC1($\mathcal{Y}$) for $\mathcal{Y} \subseteq \mathcal{X}$ with $1 \le |\mathcal{Y}| \le k-1$: For any subcollection $\mathcal{Y} \subseteq \mathcal{X}$ with $1 \le |\mathcal{Y}| \le k-2$,
HC1($\mathcal{Y}$) is trivially satisfied. Since the union of two distinct $(k-2)$-sets contains at least $k-1$ elements
and we are considering cases where $k-1 \ge 2$, it follows that HC1($\mathcal{Y}$) is also satisfied for a subcollection
$\mathcal{Y} \subseteq \mathcal{X}$ with $|\mathcal{Y}| = k-1$.

2. HC1($\mathcal{Y}$) for $\mathcal{Y} \subseteq \mathcal{X}$ with $|\mathcal{Y}| = k$: Consider any subcollection $\mathcal{Y} \subseteq \mathcal{X}$ such that $|\mathcal{Y}| = k$. One of
the following applies for $\mathcal{Y}$.

(a) All the sets of $\mathcal{Y}$ are $(k-2)$-sets. Since a $(k-1)$-set contains $k-1$ distinct $(k-2)$-subsets,
the union of $k$ distinct $(k-2)$-sets contains at least $k$ elements. So, HC1($\mathcal{Y}$) is satisfied.

(b) $\mathcal{Y}$ has one or more copies of a $(k-1)$-set $X$. In this case, observe that $\mathcal{Y}$ has a set $Y$ such that
$Y \not\subseteq X$. This is because, in the initial collection $\mathcal{X}_I$ there are $k-1$ copies of $X$ (including itself),
and during construction a subset of $X$ is added to $\mathcal{X}_I$ only after deleting one copy of $X$. So, in the
final collection $\mathcal{X}$, there are at most $k-2$ other sets $X'$ such that $X' \subseteq X$.
So, we have $|X \cup Y| \ge k$. Hence, HC1($\mathcal{Y}$) is satisfied. □

So, $(S,\mathcal{X})$ is an $(n,N,k,m)$-CBC, where $N = \sum_{X \in \mathcal{X}} |X| = n(k-1) - \left\lfloor \frac{U_{m,k,k-1} - n}{m-k+1} \right\rfloor$. Hence, following
Theorem 3.1, it is an optimal CBC. Therefore, we have proved the following.

Theorem 3.2.¶ Let $\binom{m}{k-2} \le n \le (k-1)\binom{m}{k-1}$. Then $N(n,k,m) = n(k-1) - \left\lfloor \dfrac{(k-1)\binom{m}{k-1} - n}{m-k+1} \right\rfloor$.



^In a very recent and independent work Bujtas and Tuza also proved this result. See [B] for this and more results. 
3.3 Construction of optimal and almost optimal CBCs for the range $\binom{m}{k-2} - (m-k+1)A(m,4,k-3) \le n \le \binom{m}{k-2}$
for $k \ge 5$: Construction using Binary Constant Weight
Codes

Before describing the construction, we briefly state relevant results of binary constant weight codes. See 
[7] or any standard text on coding theory for basic details and terminology related to codes. 

Let $S$ be an $l$-set $\{s_1, s_2, \ldots, s_l\}$. For $I \subseteq S$, the characteristic vector of $I$ is the vector $\chi_I = (c_1, c_2, \ldots, c_l) \in
\mathbb{F}_2^l$ such that $c_i = 1$ iff $s_i \in I$, $1 \le i \le l$, where $\mathbb{F}_2$ is the finite field of order 2. So, a subset of a set can be
naturally identified with its characteristic vector.

A binary constant weight code is a nonlinear code over $\mathbb{F}_2$, every codeword of which has the same weight. In
order to apply the results on binary constant weight codes in the setting of set systems, we view codewords
as characteristic vectors of subsets. A $w$-subset of an $l$-set is identified with a codeword of length $l$ and
weight $w$, where the codeword is the characteristic vector of the subset. Thus, if the distance between two
codewords is $d$, then the cardinality of the symmetric difference of the two corresponding subsets is also $d$; we say that
such a pair of subsets is $d$ distance apart. Here it may be observed that if two sets have the same cardinality,
then the cardinality of their symmetric difference is even; or equivalently, the distance between two codewords of
the same weight is even.‖
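The identification of subsets with codewords can be illustrated directly. In this small sketch the ground set, the two subsets and the helper names are arbitrary choices made for illustration.

```python
def characteristic_vector(subset, ground):
    """0/1 vector with a 1 exactly at the positions of the elements of subset."""
    return tuple(1 if s in subset else 0 for s in ground)


def hamming_distance(u, v):
    """Number of coordinates in which two vectors of equal length differ."""
    return sum(a != b for a, b in zip(u, v))


ground = ["s1", "s2", "s3", "s4", "s5", "s6"]
A = {"s1", "s2", "s3"}
B = {"s1", "s4", "s5"}

u = characteristic_vector(A, ground)
v = characteristic_vector(B, ground)
print(u, v)
print(hamming_distance(u, v), len(A ^ B))
```

The Hamming distance of the two codewords equals the size of the symmetric difference `A ^ B`, and it is even because both subsets have the same cardinality.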

In the context of CBCs, we are concerned with constructions of and bounds, especially lower bounds, on
the size of binary constant weight codes. Constructions of binary constant weight codes, although very
interesting, will not be discussed. But to give an approximate idea of the range of values of $n$ covered by our
construction, we state two general lower bounds obtained in [11].

Let $A(n,2d,w)$ denote the maximum number of codewords of a binary constant weight code of length $n$,
weight $w$ and minimum distance $2d$ over the field $\mathbb{F}_2$. In [11], Graham and Sloane gave an elegant construction
of binary constant weight codes with minimum distance 4, which leads to the following lower bound on
the maximum number of codewords.

Theorem 3.3 ([11]). $A(n,4,w) \ge \dfrac{1}{n}\binom{n}{w}$.
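A well-known way to obtain codes meeting the bound of Theorem 3.3 is to partition the $w$-subsets of $\mathbb{Z}_n$ according to the sum of their elements modulo $n$ and to take the largest class. The sketch below follows that idea as an illustration; it is written from the general description of the method and is not taken from [11] verbatim, and the function name is chosen here.

```python
from itertools import combinations
from math import comb


def sum_class_code(n, w):
    """Largest class of w-subsets of {0,...,n-1} sharing the same element sum mod n."""
    classes = {}
    for support in combinations(range(n), w):
        classes.setdefault(sum(support) % n, []).append(frozenset(support))
    return max(classes.values(), key=len)


n, w = 8, 3
code = sum_class_code(n, w)

# Two distinct w-sets at distance 2 differ by swapping one element i for another j,
# which changes the sum by i - j, nonzero mod n; so a class has minimum distance >= 4.
min_distance = min(len(a ^ b) for a, b in combinations(code, 2))
print(len(code), comb(n, w) / n, min_distance)
```

Since the $n$ classes partition all $\binom{n}{w}$ subsets, the largest class has at least $\frac{1}{n}\binom{n}{w}$ members.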

They gave another construction for arbitrary d, and hence the following lower bound. 

Theorem 3.4 ([11]). Let $q$ be a prime power such that $q \ge n$. Then
$A(n,2d,w) \ge \dfrac{1}{q^{d-1}}\binom{n}{w}$.

This, along with Johnson's upper bound on the size of binary constant weight codes, results in the
following asymptotic estimate of $A(n,2d,w)$.

Theorem 3.5 ([11]). $\dfrac{n^{w-d+1}}{w!} \lesssim A(n,2d,w) \lesssim \dfrac{(d-1)!\, n^{w-d+1}}{w!}$, for $w$ fixed as $n \to \infty$.

Remark 3.3.1. For $d = 2$ the above implies $A(n,4,w) \sim \dfrac{n^{w-1}}{w!}$.



There have been various improvements (in specific cases) of the bounds in Theorem 3.3 and Theorem
3.4. For example, in [11], the authors themselves gave a construction based on sets with distinct sums,
and the lower bound thus obtained is better than that of Theorem 3.4 for lower values of $n$. For related
results and improvements of the bound in Theorem 3.3 see [13], [14], and a more recent [12] (and references
given there). For various constructions and bounds of constant weight codes see [9], [10], and [15] (and
references given there).

‖Due to this, the distance between two codewords of a binary constant weight code is commonly given in terms of $2d$ rather
than $d$.

Construction. Our overall construction procedure for this range is similar to our previous construction; 
although with a different initial collection (Xi) and auxiliary collection (X a ). As in the previous case, we 
take S to be the set of servers, |5| = m. We take the initial collection to be the collection Xi of sets 
of an optimal (n, N, k, m)-CBC (S,Xi), where n = U m ^,k-2 = (^2)' This collection consists of all the 
(k — 2)-subsets of S. 

For the CBC to be constructed, we have $U_{m,k,k-2} - (m-k+1)A(m,4,k-3) \le n \le U_{m,k,k-2}$. Hence, $0 \le U_{m,k,k-2} - n \le (m-k+1)A(m,4,k-3)$. However, unlike in the previous construction, the choice of the auxiliary collection ($\mathcal{X}_a$) of sets is not arbitrary. In this case, we take $\mathcal{X}_a$ to be a collection of $\left\lceil \frac{U_{m,k,k-2}-n}{m-k+1} \right\rceil$ distinct $(k-3)$-subsets (of $S$) which are mutually at distance at least 4 apart. Note that the choice of the $(k-3)$-sets of $\mathcal{X}_a$ is guided by codewords of corresponding binary constant weight codes, which is possible for the range of values of $U_{m,k,k-2} - n$. Next, we do the following $\left\lfloor \frac{U_{m,k,k-2}-n}{m-k+1} \right\rfloor$ times.



1. Select a $(k-3)$-set from $\mathcal{X}_a$ and delete each of its $m-k+3$ supersets from $\mathcal{X}_i$. This can be done for each $(k-3)$-set of $\mathcal{X}_a$ irrespective of all the previous deletions. This is because the $(k-3)$-sets of $\mathcal{X}_a$ are mutually at distance at least 4 apart. Hence, the union of any two distinct sets of $\mathcal{X}_a$ contains at least $k-1$ elements, so two $(k-3)$-sets of $\mathcal{X}_a$ cannot have the same $(k-2)$-superset in $\mathcal{X}_i$.

2. Delete the $(k-3)$-set from $\mathcal{X}_a$ and add two copies of the set to $\mathcal{X}_i$.

Finally, if $(m-k+1) \nmid (U_{m,k,k-2} - n)$, then for the remaining $(k-3)$-set of $\mathcal{X}_a$, delete $(U_{m,k,k-2} - n) - \left\lfloor \frac{U_{m,k,k-2}-n}{m-k+1} \right\rfloor (m-k+1)$ of its supersets from $\mathcal{X}_i$.
In the end, we get the final collection $\mathcal{X}$ of $n$ subsets of $S$.
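The steps above can be sketched in Python for small parameters. This is only an illustrative sketch: the function names `greedy_code` and `build_cbc` are hypothetical, servers are modelled as the integers 0 to m-1, and the auxiliary collection is chosen by a simple greedy search, which is an assumption of this sketch (a greedy code can be smaller than an optimal constant weight code, so the search may fail for some n inside the admissible range).

```python
from itertools import combinations
from math import comb


def greedy_code(m, w, min_dist, size):
    """Greedily pick `size` w-subsets of range(m) whose pairwise symmetric
    difference has at least `min_dist` elements (a constant weight code).
    Greedy search is an assumption of this sketch, not the paper's method."""
    code = []
    if size == 0:
        return code
    for cand in combinations(range(m), w):
        s = frozenset(cand)
        if all(len(s ^ t) >= min_dist for t in code):
            code.append(s)
            if len(code) == size:
                return code
    raise ValueError("greedy search found too few codewords")


def build_cbc(m, k, n):
    """Return the final collection as a list of frozensets (repeats allowed)."""
    q = m - k + 1
    excess = comb(m, k - 2) - n              # U_{m,k,k-2} - n
    if excess < 0:
        raise ValueError("n exceeds the number of (k-2)-subsets")
    full, rem = divmod(excess, q)
    # Auxiliary collection: ceil(excess / q) sets of size k-3, distance >= 4.
    aux = greedy_code(m, k - 3, 4, full + (1 if rem else 0))
    # Initial collection: all (k-2)-subsets of the server set.
    coll = [frozenset(c) for c in combinations(range(m), k - 2)]
    for v in aux[:full]:
        coll = [b for b in coll if not v < b]    # step 1: drop its supersets
        coll += [v, v]                           # step 2: add two copies
    if rem:
        # Leftover auxiliary set: drop only `rem` of its supersets.
        doomed = set([b for b in coll if aux[full] < b][:rem])
        coll = [b for b in coll if b not in doomed]
    return coll


blocks = build_cbc(m=7, k=5, n=30)
print(len(blocks), sum(len(b) for b in blocks))
```

For m = 7, k = 5, n = 30 the sketch returns 30 sets of total size 88, which agrees with n(k-2) - 2⌊5/3⌋ = 90 - 2.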
Proof of correctness. Note that the sets of the final collection $\mathcal{X}$ are of cardinality $k-2$ and $k-3$, that there are exactly two copies of each $(k-3)$-set, and that the $(k-2)$-sets are all distinct. We prove correctness of the construction by showing that $(S, \mathcal{X})$ satisfies HC2[k], i.e., HC2(X) is satisfied for all $X \subseteq S$ with $0 \le |X| \le k-1$.

1. HC2(X) for $X \subseteq S$ with $0 \le |X| \le k-2$: HC2(X) is trivially satisfied for all $X \subseteq S$ with $0 \le |X| \le k-4$. Since there are exactly two copies of a $(k-3)$-set in $\mathcal{X}$, HC2(X) is also satisfied for all $X \subseteq S$ with $|X| = k-3$, for $k \ge 5$. Now, two distinct $(k-3)$-sets of $\mathcal{X}$ come from $\mathcal{X}_a$ and are at least distance 4 apart, hence their union contains at least $k-1$ elements. Following the construction, the union of a $(k-3)$-set and a $(k-2)$-set of $\mathcal{X}$ contains at least $k-1$ elements, and the same holds trivially for two distinct $(k-2)$-sets. Hence, the union of two or more distinct sets of $\mathcal{X}$ contains at least $k-1$ elements. Therefore, any $X$ with $|X| = k-2$ contains at most two sets of $\mathcal{X}$ (counting copies), and HC2(X) is satisfied for all $X \subseteq S$ with $|X| = k-2$, for $k \ge 5$.

2. HC2(X) for $X \subseteq S$ with $|X| = k-1$: Suppose there is $X \subseteq S$ with $|X| = k-1$ such that HC2(X) is violated. Then $r$ sets of $\mathcal{X}$ are contained in $X$, where $r \ge k$. Among those $r$ sets, let $U_1, U_2, \ldots, U_q$ be the $(k-2)$-sets and $V_1, V_2, \ldots, V_{r-q}$ be the $(k-3)$-sets, for some $q \le r$.

For each $V_i$, $1 \le i \le r-q$, let $\mathcal{H}_i = \{W \mid V_i \subset W \subset X\}$. That is, $\mathcal{H}_i$ is the set of $(k-2)$-subsets of $X$ that contain $V_i$. Now, observe the following.

(a) $|\mathcal{H}_i| = 2$, $1 \le i \le r-q$.

(b) $U_i \notin \mathcal{H}_j$, $1 \le i \le q$, $1 \le j \le r-q$. This is because, following the construction, $V_j \not\subset U_i$, $1 \le i \le q$, $1 \le j \le r-q$.

(c) $\mathcal{H}_i \cap \mathcal{H}_j = \emptyset$ if $V_i \ne V_j$, $1 \le i < j \le r-q$. As noted earlier in construction step 1, this follows from our choice of the auxiliary collection $\mathcal{X}_a$; no two distinct $(k-3)$-sets of $\mathcal{X}$ have the same $(k-2)$-superset.

Since there are 2 copies of each $(k-3)$-subset in the final collection $\mathcal{X}$, there are at least $\left\lceil \frac{r-q}{2} \right\rceil$ distinct $V_i$'s in $X$. Hence, there are at least $\left\lceil \frac{r-q}{2} \right\rceil$ disjoint $\mathcal{H}_i$'s. Thus, the number of distinct $(k-2)$-subsets of $X$ is at least $q + 2\left\lceil \frac{r-q}{2} \right\rceil \ge r \ge k$, which is not possible since $|X| = k-1$ and a $(k-1)$-set has only $k-1$ subsets of size $k-2$.

Thus $X$ satisfies HC2(X). □
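For small parameters the condition can also be checked by brute force. The sketch below assumes, as the proof above uses it, that HC2(X) means "at most |X| sets of the collection (counted with repetition) are contained in X". The function name `satisfies_hc2` and the toy parameters (6 servers, k = 5) are illustrative choices, not taken from the paper.

```python
from itertools import combinations


def satisfies_hc2(servers, collection, k):
    """Brute-force HC2[k]: every X with |X| <= k-1 contains at most |X|
    members of `collection` (repeats counted)."""
    for size in range(k):
        for x in combinations(servers, size):
            xs = frozenset(x)
            if sum(1 for b in collection if b <= xs) > size:
                return False
    return True


servers = range(6)
triples = [frozenset(t) for t in combinations(servers, 3)]
pair = frozenset({0, 1})

# Construction-style collection: supersets of `pair` removed, two copies added.
good = [t for t in triples if not pair < t] + [pair, pair]
# Same two copies, but the supersets of `pair` are kept.
bad = triples + [pair, pair]

print(satisfies_hc2(servers, good, k=5))
print(satisfies_hc2(servers, bad, k=5))
```

The second collection fails because a 4-set containing {0, 1} then holds four triples plus the two copies of the pair, which illustrates why step 1 of the construction deletes the supersets.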



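So, $(S, \mathcal{X})$ is an $(n, N, k, m)$-CBC, where $N = \sum_{X \in \mathcal{X}} |X| = n(k-2) - 2\left\lfloor \frac{U_{m,k,k-2}-n}{m-k+1} \right\rfloor$.

Regarding optimality of the construction, we note that the lower bound on $N(n,k,m)$ given by Theorem 3.1 for this range of values of $n$ is $n(k-2) - \left\lfloor \frac{2(U_{m,k,k-2}-n)}{m-k+1} \right\rfloor$. Hence, the difference between the value of $N$ obtained from our construction and the value of $N$ given by the lower bound that we have obtained, for given values of $n$, $m$, and $k$, is

$$\left( n(k-2) - 2\left\lfloor \frac{U_{m,k,k-2}-n}{m-k+1} \right\rfloor \right) - \left( n(k-2) - \left\lfloor \frac{2(U_{m,k,k-2}-n)}{m-k+1} \right\rfloor \right) = \begin{cases} 0 & \text{when } 0 \le (U_{m,k,k-2} - n) \bmod (m-k+1) < \frac{m-k+1}{2}, \\ 1 & \text{when } \frac{m-k+1}{2} \le (U_{m,k,k-2} - n) \bmod (m-k+1) < m-k+1. \end{cases}$$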



So, the construction yields optimal CBCs for approximately half of the values of $n$ in this range; for the other half, the value of $N$ for the constructed CBC differs by one from the corresponding lower bound that we have obtained. Hence, we have the following theorem.



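Theorem 3.6. Let $\binom{m}{k-2} - (m-k+1)A(m,4,k-3) \le n \le \binom{m}{k-2}$. Then

$$N(n,k,m) = n(k-2) - \left\lfloor \frac{2\left(\binom{m}{k-2} - n\right)}{m-k+1} \right\rfloor \quad \text{for } 0 \le \left(\tbinom{m}{k-2} - n\right) \bmod (m-k+1) < \frac{m-k+1}{2},$$

and

$$N(n,k,m) \le n(k-2) - 2\left\lfloor \frac{\binom{m}{k-2} - n}{m-k+1} \right\rfloor \quad \text{for } \frac{m-k+1}{2} \le \left(\tbinom{m}{k-2} - n\right) \bmod (m-k+1) < m-k+1.$$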
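The case split in the theorem comes down to the gap between 2⌊x/q⌋ and ⌊2x/q⌋, which can be tabulated for a small instance. The sketch below uses the hypothetical helper names `construction_storage` and `lower_bound`, and takes m = 7, k = 5; since A(7, 4, 2) = 3 (at most three pairwise disjoint pairs fit in seven points), the admissible range is 26 ≤ n ≤ 35.

```python
from math import comb


def construction_storage(n, k, m):
    """Total storage of the constructed CBC: n(k-2) - 2*floor((U-n)/q)."""
    q = m - k + 1
    return n * (k - 2) - 2 * ((comb(m, k - 2) - n) // q)


def lower_bound(n, k, m):
    """Lower bound quoted in this section: n(k-2) - floor(2(U-n)/q)."""
    q = m - k + 1
    return n * (k - 2) - (2 * (comb(m, k - 2) - n)) // q


m, k = 7, 5
q = m - k + 1
for n in range(26, 36):
    r = (comb(m, k - 2) - n) % q          # residue deciding the case
    gap = construction_storage(n, k, m) - lower_bound(n, k, m)
    print(n, r, gap)                      # gap is 0 when r < q/2, else 1
```

For example, n = 31 gives residue 1 and gap 0 (both values are 91), while n = 30 gives residue 2 and gap 1 (88 against 87).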



3.4 Construction of c-uniform CBCs for $1 \le \lfloor k/2 \rfloor \le c < k-1$

In this section, we give an explicit construction of a c-uniform $(n, cn, k, m)$-CBC $(S, \mathcal{X})$ for given values of $k$ and $m$, where $1 \le \lfloor k/2 \rfloor \le c < k-1$ and $n = (k-c-1)A(m, 2(k-c-1), c)$.

Construction. Let $S$ be the set of servers, where $|S| = m$. $\mathcal{X}$ is a collection of $c$-subsets of $S$, each having $k-c-1$ copies, where the distinct $c$-subsets correspond to codewords of a binary constant weight code of length $m$, weight $c$, and distance $2(k-c-1)$. So, distinct $c$-sets of $\mathcal{X}$ are mutually at distance at least $2(k-c-1)$ apart. Note that for $1 \le \lfloor k/2 \rfloor \le c < k-1$, we have $1 \le k-c-1 \le \left\lfloor \frac{k-1}{2} \right\rfloor \le \left\lfloor \frac{k}{2} \right\rfloor \le c$.

This construction yields a c-uniform $(n, cn, k, m)$-CBC $(S, \mathcal{X})$, where $n = |\mathcal{X}| = (k-c-1)A(m, 2(k-c-1), c)$. Below we prove correctness of the construction.

Proof of correctness. We prove correctness of this construction by showing that $(S, \mathcal{X})$ satisfies HC2[k], i.e., HC2(X) is satisfied for all $X \subseteq S$ with $0 \le |X| \le k-1$.

1. HC2(X) for $X \subseteq S$ with $0 \le |X| \le k-2$: HC2(X) is trivially satisfied for all $X \subseteq S$ with $0 \le |X| \le c-1$. Since there are $k-c-1$ copies of each $c$-set and $c \ge k-c-1$, HC2(X) is also satisfied for all $X \subseteq S$ with $|X| = c$. Now, the union of two distinct $c$-sets of $\mathcal{X}$ contains at least $k-1$ elements, so any $X$ with $c+1 \le |X| \le k-2$ contains at most one distinct $c$-set of $\mathcal{X}$. Hence HC2(X) is satisfied for all $X \subseteq S$ with $c+1 \le |X| \le k-2$.

2. HC2(X) for $X \subseteq S$ with $|X| = k-1$: For any $X \subseteq S$ such that $|X| = k-1$, one of the following applies.

(a) $X$ contains at most two distinct $c$-sets of $\mathcal{X}$. In this case, $X$ contains at most $2(k-c-1) \le 2\left\lfloor \frac{k-1}{2} \right\rfloor \le k-1$ $c$-sets of $\mathcal{X}$. Hence HC2(X) is satisfied for this case.

(b) $X$ contains at least 3 distinct $c$-sets of $\mathcal{X}$. Then we have

i. for any two distinct $X_1, X_2 \in \mathcal{X}$ such that $X_1, X_2 \subseteq X$, it follows that $|X_1 \cup X_2| = |X| = k-1$ and $|X_1 \setminus X_2| = k-c-1$;

ii. for any three distinct $X_1, X_2, X_3 \in \mathcal{X}$ such that $X_1, X_2, X_3 \subseteq X$, $(X_1 \setminus X_2) \cap (X_1 \setminus X_3) = \emptyset$. For, if for some $s \in X$, $s \in (X_1 \setminus X_2) \cap (X_1 \setminus X_3)$, then $s \notin (X_2 \cup X_3)$. Hence, $|X_2 \cup X_3| \le k-2$, a contradiction.

Hence, given $Y \in \mathcal{X}$ such that $Y \subseteq X$, there can be at most $\left\lfloor \frac{c}{k-c-1} \right\rfloor$ other distinct $X'$ such that $X' \in \mathcal{X}$, $X' \subseteq X$, and $|X' \setminus Y| = k-c-1$. Hence, there are at most $(k-c-1)\left\lfloor \frac{c}{k-c-1} \right\rfloor + k-c-1 \le k-1$ $c$-sets of $\mathcal{X}$ contained in $X$. Hence, HC2(X) is satisfied for this case.

So, HC2(X) is satisfied for all $X \subseteq S$ with $|X| = k-1$. □
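As a sanity check, the c-uniform construction can be run on a small instance and verified by brute force. The sketch below is illustrative only: `greedy_code`, `uniform_cbc`, and `satisfies_hc2` are hypothetical names, the constant weight code is found greedily (an assumption of this sketch, so it need not reach A(m, 2(k-c-1), c) in general), and HC2(X) is taken to mean that at most |X| sets of the collection lie inside X.

```python
from itertools import combinations


def greedy_code(m, w, min_dist):
    """Greedy constant weight code: w-subsets of range(m) whose pairwise
    symmetric difference has at least `min_dist` elements."""
    code = []
    for cand in combinations(range(m), w):
        s = frozenset(cand)
        if all(len(s ^ t) >= min_dist for t in code):
            code.append(s)
    return code


def uniform_cbc(m, k, c):
    """k-c-1 copies of every codeword of weight c and distance 2(k-c-1)."""
    copies = k - c - 1
    if copies < 1 or c < k // 2:
        raise ValueError("need floor(k/2) <= c < k-1")
    code = greedy_code(m, c, 2 * copies)
    return [s for s in code for _ in range(copies)]


def satisfies_hc2(m, collection, k):
    """Brute-force HC2[k] over all X with |X| <= k-1."""
    for size in range(k):
        for x in combinations(range(m), size):
            xs = frozenset(x)
            if sum(1 for b in collection if b <= xs) > size:
                return False
    return True


coll = uniform_cbc(m=7, k=6, c=3)
print(len(coll), satisfies_hc2(7, coll, k=6))

# One extra copy of every codeword breaks the condition.
too_many = coll + list(set(coll))
print(satisfies_hc2(7, too_many, k=6))
```

With m = 7, k = 6, c = 3 the greedy search happens to find seven triples that pairwise share at most one point, so the collection has 14 sets and passes the check; adding a third copy of each triple makes some 5-subset contain six sets, and the check fails.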

In order to get an idea of the value of $n$, i.e., the size of $\mathcal{X}$, we use the asymptotic bound given in Theorem 3.5, which shows $n \gtrsim \frac{(k-c-1)\, m^{2c-k+2}}{c!}$ as $m \to \infty$. In [2], the authors gave a non-constructive proof of existence of c-uniform $(n, cn, k, m)$-CBCs for which $n$ is $\Omega\left(m^{\frac{ck}{k-1}-1}\right)$; for sufficiently large $m$, $c$, and $k$ such that $c \sim k$ and $c^c \sim m$, the value of $n$ (in the asymptotic sense) obtained from our explicit construction compares well with this value.